
Add RT acceleration structure abstraction with size queries and resource allocation #1232

Merged
MarijnS95 merged 1 commit into llvm:main from Traverse-Research:rt-as-abstraction on Jun 6, 2026


@MarijnS95 (Collaborator) commented May 27, 2026

For #1158

Introduce the foundational types for ray tracing acceleration structures:

  • Abstract AccelerationStructure base class
  • Geometry/instance descriptors and BLAS/TLAS build-request structs with size queries
  • AS resource allocation across DX12, Vulkan, and Metal backends

Recording the actual build commands lands in a follow-up commit on top of the ComputeEncoder abstraction.

DX12 device interface bump

ID3D12DeviceX typedef goes from ID3D12Device2 to ID3D12Device5, so the existing Device member directly exposes GetRaytracingAccelerationStructurePrebuildInfo (and the eventual CreateStateObject / SetPipelineState1 for the PSO RT epic), with no separate Device5 member or post-create QueryInterface dance. D3D12CreateDevice is already invoked with IID_PPV_ARGS(&Device), so the bump implicitly requires the adapter to support the Device5 interface (Win10 1809+). Whether the hardware is actually RT-capable is still decided separately by the acceleration-structure lit feature.

Vulkan device-creation refactor

Device creation now makes a single vkGetPhysicalDeviceFeatures2 call. Every extension feature struct we care about (atomic-int64, mesh-shader, acceleration-structure, BDA on 1.1) is chained into pNext before the query. After the query, we verify each extension's gating feature bool and clear the sub-features we don't need (capture-replay, indirect-build, multiview, etc.).

Drive-by: instead of letting vkCreateDevice reject the device with a generic VK_ERROR_FEATURE_NOT_PRESENT, the code returns a descriptive llvm::Error naming the extension and the bool that came back zero. This pinpoints the case where a driver advertises an extension but reports its base feature as VK_FALSE. The duplicate queryDeviceExtensions call is gone, and EnabledDeviceExtensions is now a single list. The previously separate EnabledExtensions vector silently dropped the mesh-shader / atomic-int64 entries when RT was enabled.
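Below is a minimal sketch of the chain-then-verify pattern described above. It is not the project's code. It uses hypothetical stand-ins (`FeatureHeader`, `ASFeatures`, `chainFront`, `finalizeASFeatures`) in place of the real Vulkan structs so it builds without Vulkan headers. Each struct is prepended to the chain before the single query. After the query, the gating bool is checked and the unused sub-features are cleared.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for VkBaseOutStructure: every feature struct starts
// with this header so the structs can be linked through PNext.
struct FeatureHeader {
  const char *Extension;          // e.g. "VK_KHR_acceleration_structure"
  FeatureHeader *PNext = nullptr; // next struct in the query chain
};

// Hypothetical, trimmed-down mirror of
// VkPhysicalDeviceAccelerationStructureFeaturesKHR.
struct ASFeatures {
  FeatureHeader Header{"VK_KHR_acceleration_structure"};
  bool AccelerationStructure = false;              // gating bool
  bool AccelerationStructureCaptureReplay = false; // sub-feature we don't use
  bool AccelerationStructureIndirectBuild = false; // sub-feature we don't use
};

// Prepends Node to the chain rooted at Head, the way each extension struct is
// hooked into the features2 pNext chain before the single query.
void chainFront(FeatureHeader *&Head, FeatureHeader &Node) {
  Node.PNext = Head;
  Head = &Node;
}

// Post-query step. If the driver advertised the extension but reported the
// gating bool as false, return a message naming both (instead of a generic
// VK_ERROR_FEATURE_NOT_PRESENT later). Otherwise clear unused sub-features.
std::optional<std::string> finalizeASFeatures(ASFeatures &F) {
  if (!F.AccelerationStructure)
    return std::string(F.Header.Extension) +
           " is advertised but accelerationStructure came back VK_FALSE";
  F.AccelerationStructureCaptureReplay = false;
  F.AccelerationStructureIndirectBuild = false;
  return std::nullopt;
}
```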

@MarijnS95 MarijnS95 force-pushed the rt-as-abstraction branch from 3bfe062 to fd17d69 May 27, 2026 14:28
@MarijnS95 MarijnS95 force-pushed the rt-as-abstraction branch 2 times, most recently from f6ae856 to 7527088 May 28, 2026 16:26
@MarijnS95 MarijnS95 force-pushed the rt-as-abstraction branch from 7527088 to 546235a May 29, 2026 10:01
@MarijnS95 MarijnS95 marked this pull request as ready for review May 29, 2026 10:01
@MarijnS95 MarijnS95 force-pushed the rt-as-abstraction branch from 546235a to 3596fcc May 29, 2026 21:02
Comment thread include/API/AccelerationStructure.h Outdated

@EmilioLaiso (Collaborator) left a comment

lgtm

Comment thread include/API/AccelerationStructure.h Outdated
Comment thread lib/API/DX/Device.cpp Outdated
Comment thread lib/API/DX/Device.cpp Outdated
Comment thread lib/API/DX/Device.cpp Outdated
Comment thread lib/API/VK/Device.cpp
```cpp
VkBuffer Buffer;
VkDeviceMemory Memory;
VkDeviceAddress DeviceAddress;
PFN_vkDestroyAccelerationStructureKHR FnDestroyAS;
```

Contributor

Should we group all the acceleration structure extension functions together in a struct instead?

@MarijnS95 (Collaborator, Author), Jun 3, 2026

The AS extension function pointers are already grouped:

```cpp
struct RaytracingFunctions {
  PFN_vkCreateAccelerationStructureKHR CreateAS = nullptr;
  PFN_vkDestroyAccelerationStructureKHR DestroyAS = nullptr;
  PFN_vkGetAccelerationStructureBuildSizesKHR GetBuildSizes = nullptr;
  PFN_vkGetAccelerationStructureDeviceAddressKHR GetDeviceAddress = nullptr;
};
```

The lone PFN_vkDestroyAccelerationStructureKHR held on VulkanAccelerationStructure here is the only one needed for cleanup of an AS handle; not worth dragging the whole struct in for a single function.
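The design choice in the reply can be sketched as an owning handle that stores only the one destroy entry point it needs. `OwnedAS`, `DeviceHandle`, `ASHandle` and `PFN_DestroyAS` are hypothetical names. The real `PFN_vkDestroyAccelerationStructureKHR` also takes a `const VkAllocationCallbacks *`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for VkDevice, VkAccelerationStructureKHR and
// PFN_vkDestroyAccelerationStructureKHR (allocator parameter omitted).
using DeviceHandle = std::uintptr_t;
using ASHandle = std::uintptr_t;
using PFN_DestroyAS = void (*)(DeviceHandle, ASHandle);

// Owns one AS handle and keeps only the single function pointer needed to
// free it, rather than a copy of the whole function-pointer table.
class OwnedAS {
public:
  OwnedAS(DeviceHandle D, ASHandle H, PFN_DestroyAS Fn)
      : Dev(D), AS(H), Destroy(Fn) {}
  OwnedAS(const OwnedAS &) = delete;            // unique ownership
  OwnedAS &operator=(const OwnedAS &) = delete;
  ~OwnedAS() {
    if (AS != 0 && Destroy != nullptr)
      Destroy(Dev, AS); // the only call this object ever needs to make
  }
  ASHandle handle() const { return AS; }

private:
  DeviceHandle Dev;
  ASHandle AS;
  PFN_DestroyAS Destroy;
};
```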

@MarijnS95 MarijnS95 force-pushed the rt-as-abstraction branch 2 times, most recently from bda11a5 to 4e7f57e June 3, 2026 11:47
@MarijnS95 MarijnS95 requested a review from manon-traverse June 3, 2026 13:03
@MarijnS95 MarijnS95 force-pushed the rt-as-abstraction branch from 4e7f57e to 824e33c June 3, 2026 14:58
Comment thread lib/API/VK/Device.cpp Outdated
…rce allocation

Introduce the foundational types for ray tracing acceleration structures:
abstract `AccelerationStructure` base class, geometry/instance
descriptors, BLAS/TLAS build-request structs with size queries, and AS
resource allocation across DX12, Vulkan, and Metal. Recording build
commands lands in a follow-up commit on top of the ComputeEncoder
abstraction.

DX12: `ID3D12DeviceX` typedef bumps from `ID3D12Device2` to
`ID3D12Device5`, so the existing `Device` member directly exposes
`GetRaytracingAccelerationStructurePrebuildInfo` (and the eventual
`CreateStateObject` / `SetPipelineState1` for the PSO RT epic) — no
separate `Device5` member or post-create `QueryInterface` dance.
`D3D12CreateDevice` is already invoked with `IID_PPV_ARGS(&Device)`, so
the bump naturally requires the adapter to support the Device5
interface (Win10 1809+); RT-capable hardware is selected by the
`acceleration-structure` lit feature regardless.

Vulkan device creation switches to a single `vkGetPhysicalDeviceFeatures2`
call covering every extension feature struct we care about (atomic-int64,
mesh-shader, acceleration-structure, BDA on 1.1): each struct is chained
into `pNext` before the query, and post-query we verify the gating bool
and clear the sub-features we don't enable (capture-replay,
indirect-build, multiview, etc.).

Drive-by: rather than letting `vkCreateDevice` reject the device with a
generic `VK_ERROR_FEATURE_NOT_PRESENT`, the code now returns a
descriptive `llvm::Error` naming the extension and the bool that came
back zero — pinpointing the case where a driver advertises an extension
but reports its base feature as `VK_FALSE`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MarijnS95 MarijnS95 force-pushed the rt-as-abstraction branch from 824e33c to a7cdb63 June 5, 2026 23:09
@MarijnS95 MarijnS95 merged commit f724b62 into llvm:main Jun 6, 2026
22 of 26 checks passed
@MarijnS95 MarijnS95 deleted the rt-as-abstraction branch June 6, 2026 08:00
MarijnS95 added a commit that referenced this pull request Jun 11, 2026
Closes #1158 🥳

## Summary
Wire up acceleration-structure descriptor binding end-to-end across all
three backends so shaders can actually consume the TLAS that
`buildPipelineAccelerationStructures()` produced — completing the stack
and promoting the three InlineRT tests from XFAIL to passing.

Per-resource AS handling lands in a new per-backend `createAS()` (paired
with `createSRV()` / `createUAV()` / `createCBV()`): a pure
single-create that queries TLAS sizes via `Dev.getTLASBuildSizes()` and
allocates the handle via `Dev.createTLAS()`, returning the `unique_ptr`
to the caller. No `InvocationState` or `Pipeline` access — the
multi-create (`createBuffers()` / `createResources()`) records the
handle in `InvocationState::TLASes` (a `StringMap` keyed by
`TLASDesc::Name`) and wires a non-owning AS pointer into the
per-resource bundle the binding loop reads. The shared AS-build helper
picks up that map and walks `P.AccelStructs.TLAS` to pair each YAML
descriptor with its pre-allocated handle by name (TLASes without a map
entry are skipped, i.e. declared but unbound). BLAS handles are still
allocated by the helper itself since BLASes aren't user-bindable.
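The single-create / multi-create split and the pairing by name can be sketched as follows. `std::map` stands in for `llvm::StringMap`. `AccelerationStructure`, `TLASDesc`, `createAS` and `pairTLASes` are simplified hypothetical versions, not the framework's signatures.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified types standing in for the backend AS handle and
// the YAML TLASDesc.
struct AccelerationStructure {
  std::string DebugName;
};
struct TLASDesc {
  std::string Name;
};
using TLASMap = std::map<std::string, std::unique_ptr<AccelerationStructure>>;

// Single-create: allocate one TLAS handle and return ownership to the caller
// (the real createAS() would query build sizes first).
std::unique_ptr<AccelerationStructure> createAS(const TLASDesc &Desc) {
  auto AS = std::make_unique<AccelerationStructure>();
  AS->DebugName = Desc.Name;
  return AS;
}

// Shared build helper: walk the declared TLASes and pair each with its
// pre-allocated handle by name. Declared-but-unbound TLASes have no map entry
// and are skipped.
std::vector<std::pair<const TLASDesc *, AccelerationStructure *>>
pairTLASes(const std::vector<TLASDesc> &Declared, const TLASMap &TLASes) {
  std::vector<std::pair<const TLASDesc *, AccelerationStructure *>> Pairs;
  for (const TLASDesc &Desc : Declared) {
    auto It = TLASes.find(Desc.Name);
    if (It == TLASes.end())
      continue;
    Pairs.emplace_back(&Desc, It->second.get());
  }
  return Pairs;
}
```

Ownership stays in the map (the multi-create's `InvocationState` in the description). The binding loop only ever sees the raw, non-owning pointer.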

`executeProgram()` in each backend now runs as:

- `createBuffers` / `createResources` (`createAS()` allocates TLAS
handles)
- open encoder → `buildPipelineAccelerationStructures()` → end

- **Vulkan**: `createDescriptorPool()` counts AS descriptors in a
separate scalar (the KHR enum value `1000150000` doesn't fit in the
indexed array used for the core types) and emits one
`VkDescriptorPoolSize` for them. `createDescriptorSets()` reads the
resolved `VulkanAccelerationStructure` handle from `ResourceRef.AS`
(populated by `createResources()`) and writes it through a
`VkWriteDescriptorSetAccelerationStructureKHR` chained on the descriptor
write's `pNext`. The dispatch's pre-barrier dst access now includes
`VK_ACCESS_ACCELERATION_STRUCTURE_READ_BIT_KHR` so the prior AS-build's
writes are visible to the shader's RayQuery reads. Device creation
enables `VK_KHR_ray_query` using the same chain-pre-query +
error-on-flag-mismatch pattern that #1232 set up for the AS / BDA
extensions — without `VK_KHR_ray_query` enabled the shader's
`OpRayQueryProceedKHR` instructions silently no-op and `Output` reads
back zero. `copyResourceDataToDevice()` short-circuits AS bundles via a
new `ResourceBundle::isAccelerationStructure()` predicate (no host
buffer to barrier).
- **DX12**: writes a
`D3D12_SRV_DIMENSION_RAYTRACING_ACCELERATION_STRUCTURE` SRV with the AS
GPU virtual address as `Location` into the heap slot that
`createBuffers()` reserved (`CreateShaderResourceView()` with a null
resource — the AS data lives in the buffer pointed to by `Location`).
- **Metal**: the Metal shader converter doesn't bind the AS directly;
the shader reads a buffer containing an
`IRRaytracingAccelerationStructureGPUHeader` that holds the AS's
`gpuResourceID` plus a pointer to an instance-contributions array.
`createBuffers()` allocates and fills both buffers per AS-descriptor
entry, then points the descriptor at the header buffer's GPU address.
The TLAS itself is built with the `UserID` instance-descriptor variant
so HLSL `CommittedInstanceID()` returns the YAML-specified per-instance
ID instead of the array index.
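Below is a sketch of the Vulkan descriptor-pool counting. Core descriptor types index a small array. The acceleration-structure type (`1000150000`) goes into its own counter and gets one extra pool-size entry. The enum and `computePoolSizes` are hypothetical; only the numeric values are taken from `VkDescriptorType`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Values copied from VkDescriptorType for the types used here. The enum itself
// is a hypothetical stand-in so the sketch builds without Vulkan headers.
enum DescriptorType : std::uint32_t {
  SampledImage = 2,
  StorageImage = 3,
  UniformBuffer = 6,
  StorageBuffer = 7,
  AccelerationStructureKHR = 1000150000,
};
// Core types run from 0 (SAMPLER) to 10 (INPUT_ATTACHMENT).
constexpr std::uint32_t NumCoreTypes = 11;

struct PoolSize {
  std::uint32_t Type;
  std::uint32_t Count;
};

std::vector<PoolSize> computePoolSizes(const std::vector<DescriptorType> &Bindings) {
  std::array<std::uint32_t, NumCoreTypes> CoreCounts{}; // indexed by type value
  std::uint32_t ASCount = 0; // separate scalar: 1000150000 can't index CoreCounts
  for (DescriptorType T : Bindings) {
    if (T == AccelerationStructureKHR)
      ++ASCount;
    else
      ++CoreCounts[T];
  }
  std::vector<PoolSize> Sizes;
  for (std::uint32_t I = 0; I < NumCoreTypes; ++I)
    if (CoreCounts[I] != 0)
      Sizes.push_back({I, CoreCounts[I]});
  if (ASCount != 0) // one extra pool-size entry covering all AS descriptors
    Sizes.push_back({static_cast<std::uint32_t>(AccelerationStructureKHR), ASCount});
  return Sizes;
}
```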

The three InlineRT tests now actually exercise the AS end-to-end: each
shader traces a `RayQuery` against `Scene` with `TraceRayInline()` and writes a
hit-dependent value into `Output` (the instance ID for `multi-instance`,
1/0 otherwise). The catch-all `XFAIL: *` is dropped; `XFAIL: Clang`
remains. The test shaders also gain explicit `[[vk::binding]]`
annotations because dxc's default HLSL→SPIR-V binding mapping collides
`Scene`'s `t0` with `Output`'s `u0` at binding 0, which VVL flags as a
descriptor type mismatch.
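Below is a small model of the binding collision described above. It assumes dxc's default mapping uses the register number as the SPIR-V binding and the register space as the set, ignoring the `t`/`u` register class, unless `[[vk::binding(binding, set)]]` overrides it. `HLSLResource`, `spirvBinding` and `findCollision` are hypothetical helpers for illustration.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of an HLSL resource and its optional vk::binding override.
struct HLSLResource {
  std::string Name;
  char RegisterClass; // 't', 'u', 'b', 's': ignored by the default mapping
  unsigned Register;
  unsigned Space;
  std::optional<std::pair<unsigned, unsigned>> VkBinding; // {binding, set}
};

// Default mapping as modeled here: binding = register number, set = space.
std::pair<unsigned, unsigned> spirvBinding(const HLSLResource &R) {
  if (R.VkBinding)
    return *R.VkBinding;
  return {R.Register, R.Space};
}

// Returns the name of the first resource whose (binding, set) is already used.
std::optional<std::string> findCollision(const std::vector<HLSLResource> &Rs) {
  std::map<std::pair<unsigned, unsigned>, std::string> Seen;
  for (const HLSLResource &R : Rs) {
    auto [It, Inserted] = Seen.emplace(spirvBinding(R), R.Name);
    if (!Inserted)
      return R.Name;
  }
  return std::nullopt;
}
```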

## Test plan

Local on an NVIDIA RTX 3060:
- [x] Linux Vulkan (native `offloader`)
- [ ] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):
- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [ ] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmilioLaiso pushed a commit that referenced this pull request Jun 15, 2026
## Summary

Stacks on top of #1232 and #1245 to add five small InlineRT tests, each
isolating one `RayQuery` method on the existing single-triangle BLAS:

- `miss-status.test` — `COMMITTED_NOTHING` path (ray points away from
geometry)
- `ray-t.test` — `CommittedRayT()` returns exactly `1.0` for the
axis-aligned hit
- `barycentrics.test` — `CommittedTriangleBarycentrics()` at world
`(0,0,0)` returns exactly `(0.25, 0.25)`
- `world-ray-echo.test` — `WorldRayOrigin` / `WorldRayDirection` /
`RayTMin` / `RayFlags` round-trip into a structured buffer; passes
`-fvk-use-dx-layout` so SPIR-V matches DXIL's tight `float3` packing and
the expected bytes are portable across DX / VK / MTL.
- `tmin-tmax-clip.test` — two queries against the same BLAS: one with
`TMin` past the hit, one with `TMax` before it; both must miss.
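The expected values in `ray-t.test`, `barycentrics.test` and `tmin-tmax-clip.test` follow from the standard Möller–Trumbore ray/triangle intersection.

The sketch uses a hypothetical triangle with vertices `(-1,-1,0)`, `(3,-1,0)` and `(-1,3,0)`, hit by a ray from `(0,0,1)` pointing down `-z`. This geometry was chosen so that a hit at world `(0,0,0)` gives `t = 1` and barycentrics `(0.25, 0.25)`, the weights of vertices 1 and 2. The actual test geometry may differ. Treating `[TMin, TMax]` as a closed interval is a simplification.

```cpp
#include <cassert>
#include <optional>

struct Vec3 { float X, Y, Z; };
static Vec3 sub(Vec3 A, Vec3 B) { return {A.X - B.X, A.Y - B.Y, A.Z - B.Z}; }
static float dot(Vec3 A, Vec3 B) { return A.X * B.X + A.Y * B.Y + A.Z * B.Z; }
static Vec3 cross(Vec3 A, Vec3 B) {
  return {A.Y * B.Z - A.Z * B.Y, A.Z * B.X - A.X * B.Z, A.X * B.Y - A.Y * B.X};
}

// T: ray parameter. U, V: barycentric weights of vertices 1 and 2.
struct Hit { float T, U, V; };

// Two-sided Möller–Trumbore test, clipped to [TMin, TMax].
std::optional<Hit> intersect(Vec3 Origin, Vec3 Dir, float TMin, float TMax,
                             Vec3 V0, Vec3 V1, Vec3 V2) {
  const Vec3 E1 = sub(V1, V0), E2 = sub(V2, V0);
  const Vec3 P = cross(Dir, E2);
  const float Det = dot(E1, P);
  if (Det == 0.0f)
    return std::nullopt; // ray parallel to the triangle's plane
  const float InvDet = 1.0f / Det;
  const Vec3 S = sub(Origin, V0);
  const float U = dot(S, P) * InvDet;
  if (U < 0.0f || U > 1.0f)
    return std::nullopt;
  const Vec3 Q = cross(S, E1);
  const float V = dot(Dir, Q) * InvDet;
  if (V < 0.0f || U + V > 1.0f)
    return std::nullopt;
  const float T = dot(E2, Q) * InvDet;
  if (T < TMin || T > TMax)
    return std::nullopt; // hit lies outside the ray interval
  return Hit{T, U, V};
}
```

With this geometry every intermediate value is a power-of-two fraction, so `t` and the barycentrics come out exact. This is why tests like these can compare against exact constants.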

First batch from #1258 (the inline-RT test coverage epic): the easiest
wins, requiring no framework or YAML changes.

## Test plan

Local on an NVIDIA RTX 3060:
- [x] Linux Vulkan (native `offloader`)
- [x] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [x] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):
- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmilioLaiso pushed a commit that referenced this pull request Jun 15, 2026
## Summary

Stacks on top of #1232 / #1245. Adds the first InlineRT test with a
non-trivial BLAS layout — three triangles tiled along x at `x = -4, 0,
+4` — and a 3-lane dispatch that fires one ray per lane straight down at
its own triangle. Each lane's `CommittedPrimitiveIndex()` must equal its
lane index. This also exercises divergent per-lane rays for free.

Seed test for the multi-primitive / multi-geometry BLAS bullets in the
inline-RT coverage epic (#1258).

Independent of the other InlineRT test PRs (#1271, #1274) — only adds a
new test file.

## Test plan

Local on an NVIDIA RTX 3060:
- [x] Linux Vulkan (native `offloader`)
- [x] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):
- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmilioLaiso pushed a commit that referenced this pull request Jun 15, 2026
## Summary

Stacks on top of #1232 / #1245. Three TLAS instances at `x = -5, 0, +5`
with `InstanceMask` values `0x01` / `0x02` / `0x04` and `InstanceID`s
`0` / `1` / `2`. A 3-lane dispatch fires one ray per lane straight down
at its own instance column, but every ray uses `InstanceInclusionMask =
0x02` — so only the middle instance survives the mask test. Lane 1
reports `InstanceID = 1`; lanes 0 and 2 miss.
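The mask rule this test relies on is a bitwise AND: an instance is considered only if `InstanceMask & InstanceInclusionMask` is nonzero. Below is a sketch with hypothetical `Instance`, `passesMask` and `traceLane` names. Each lane's ray is assumed to reach only the instance in its own column.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical instance record holding only the fields this test sets.
struct Instance {
  std::uint32_t InstanceID;
  std::uint8_t InstanceMask;
  float ColumnX; // x position of the instance's column
};

// An instance takes part in traversal only if its 8-bit mask shares at least
// one bit with the ray's InstanceInclusionMask.
bool passesMask(const Instance &I, std::uint8_t InclusionMask) {
  return (I.InstanceMask & InclusionMask) != 0;
}

// One lane: the ray goes straight down at LaneX and can only reach the
// instance in that column (the geometry itself is abstracted away).
std::optional<std::uint32_t> traceLane(const std::array<Instance, 3> &Scene,
                                       float LaneX, std::uint8_t InclusionMask) {
  for (const Instance &I : Scene)
    if (I.ColumnX == LaneX && passesMask(I, InclusionMask))
      return I.InstanceID;
  return std::nullopt; // miss
}
```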

Covers the `InstanceInclusionMask` filtering bullet in the inline-RT
coverage epic (#1258).

Independent of the other InlineRT test PRs (#1271, #1272) — only adds a
new test file.

## Test plan

Local on an NVIDIA RTX 3060:
- [x] Linux Vulkan (native `offloader`)
- [x] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):
- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmilioLaiso pushed a commit that referenced this pull request Jun 15, 2026
## Summary

Stacks on top of #1232 / #1245. Two rays at the existing single-triangle
BLAS — one from +z (sees the front face per the default winding
convention all three backends share) and one from -z (sees the back
face) — with the `RAY_FLAG_CULL_BACK_FACING_TRIANGLES` template flag
set. Lane 0 must hit and lane 1 must miss.
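Below is a sketch of the facing decision, assuming a right-handed coordinate system. The triangle appears clockwise from the ray origin exactly when its geometric normal `cross(V1 - V0, V2 - V0)` points along the ray direction. Which winding counts as front-facing is passed in as `ClockwiseIsFront` rather than asserted for any backend. The vertex order in the usage is hypothetical, chosen so that the ray from `+z` sees the front face.

```cpp
#include <cassert>

struct Vec3 { float X, Y, Z; };
static Vec3 sub(Vec3 A, Vec3 B) { return {A.X - B.X, A.Y - B.Y, A.Z - B.Z}; }
static float dot(Vec3 A, Vec3 B) { return A.X * B.X + A.Y * B.Y + A.Z * B.Z; }
static Vec3 cross(Vec3 A, Vec3 B) {
  return {A.Y * B.Z - A.Z * B.Y, A.Z * B.X - A.X * B.Z, A.X * B.Y - A.Y * B.X};
}

// True if (V0, V1, V2) winds clockwise as seen from a ray travelling along Dir.
bool appearsClockwise(Vec3 Dir, Vec3 V0, Vec3 V1, Vec3 V2) {
  return dot(cross(sub(V1, V0), sub(V2, V0)), Dir) > 0.0f;
}

// A hit is discarded when back-face culling is requested and the triangle is
// back-facing under the given winding convention.
bool culledByBackFaceFlag(Vec3 Dir, Vec3 V0, Vec3 V1, Vec3 V2,
                          bool ClockwiseIsFront, bool CullBackFacing) {
  const bool Front = appearsClockwise(Dir, V0, V1, V2) == ClockwiseIsFront;
  return CullBackFacing && !Front;
}
```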

Independent of the other InlineRT test PRs (#1271, #1272, #1274) — only
adds a new test file.

## Test plan

Local on an NVIDIA RTX 3060:
- [x] Linux Vulkan (native `offloader`)
- [x] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):
- [x] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
EmilioLaiso added a commit that referenced this pull request Jun 17, 2026
Depends on #1245

## Summary

Foundational PR in the PSO-based raytracing bring-up series tracked in
#1268. Stacks on top of #1245 (which depends on #1244, which depends on
#1232) — only the top commit on this branch is new; the rest are the
inline-RT bring-up already in review.

Lays out the framework-side surface needed by the upcoming backend PRs:

- `ShaderPipelineKind::RayTracing` plus six new `Stages` —
`RayGeneration`, `Miss`, `ClosestHit`, `AnyHit`, `Intersection`,
`Callable` — with `isRayTracingStage` / `Pipeline::isRayTracing()`
helpers.
- YAML schema for an RT pipeline: `HitGroup` (Triangles | Procedural,
ClosestHit + optional AnyHit / Intersection), `RayTracingPipelineConfig`
(MaxTraceRecursionDepth, MaxPayloadSizeInBytes, MaxAttributeSizeInBytes,
optional PipelineFlags), and `ShaderBindingTable` (raygen / miss /
hit-group / callable records, each with optional reserved LocalRootData
bytes).
- `validatePipelineKind` allows duplicate RT stages (a pipeline can have
several miss / hit-group shaders, which the existing duplicate check
would have rejected), requires at least one RayGeneration, and rejects
mixing with Compute/Vertex/Mesh. The reverse check rejects HitGroups /
RTConfig / SBT on any non-RT pipeline. `validateDispatchParameters`
reinterprets `DispatchGroupCount` as `{Width, Height, Depth}` for the
upcoming DispatchRays and forbids VertexCount on RT.
- Existing `Stages` switches across the backends grow the six RT cases:
Vulkan maps each one to its `VK_SHADER_STAGE_*_KHR` bit, ready for PR2;
Metal marks them unreachable (`metal_irconverter` takes a different route);
the raster pipelines' `setShader` (Traditional + MeshShader variants) adds
them to the existing unreachable group.
- Each backend's `executeProgram` gets a terminal `else if
(P.isRayTracing())` that returns a "not yet supported on <backend>"
error so PR2/3/4 just have to replace it.
- `%dxc_target_lib` lit substitution (same compiler binary, separate
name for `-T lib_6_x` library targets); `raytracing-pipeline`
available-feature gated on DX `RaytracingTier >= 1.0` and the Vulkan
`VK_KHR_ray_tracing_pipeline` extension being reported by the device.
- Foundational `test/Feature/RT/raygen-roundtrip.test` exercising the
full schema (raygen+miss+CH, BLAS/TLAS, HitGroups, RTConfig, SBT). Gated
on `raytracing-pipeline` and `XFAIL: *` until each backend bring-up
lands.
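The stage rules that `validatePipelineKind` gains can be sketched as follows. RT stages may repeat, an RT pipeline needs at least one RayGeneration stage, and RT stages cannot be mixed with Compute/Vertex/Mesh. The `Stage` enum, `isRayTracingStage` and `validateStages` below are simplified hypothetical versions, not the framework's code.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical subset of the framework's Stages enum.
enum class Stage {
  Compute, Vertex, Mesh,
  RayGeneration, Miss, ClosestHit, AnyHit, Intersection, Callable
};

bool isRayTracingStage(Stage S) {
  switch (S) {
  case Stage::RayGeneration:
  case Stage::Miss:
  case Stage::ClosestHit:
  case Stage::AnyHit:
  case Stage::Intersection:
  case Stage::Callable:
    return true;
  default:
    return false;
  }
}

// Returns an error message, or std::nullopt if the stage list is acceptable.
std::optional<std::string> validateStages(const std::vector<Stage> &Stages) {
  std::set<Stage> SeenNonRT;
  bool AnyRT = false, AnyNonRT = false, HasRaygen = false;
  for (Stage S : Stages) {
    if (isRayTracingStage(S)) {
      AnyRT = true; // RT stages may repeat (several miss / hit shaders)
      HasRaygen = HasRaygen || S == Stage::RayGeneration;
    } else {
      AnyNonRT = true;
      if (!SeenNonRT.insert(S).second)
        return std::string("duplicate non-RT stage");
    }
  }
  if (AnyRT && AnyNonRT)
    return std::string("RT stages cannot be mixed with Compute/Vertex/Mesh");
  if (AnyRT && !HasRaygen)
    return std::string("RT pipeline needs at least one RayGeneration stage");
  return std::nullopt;
}
```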

## Test plan

Local on an NVIDIA RTX 3060:
- [x] Linux Vulkan (native `offloader`)
- [ ] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):
- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: EmilioLaiso <emilio@traverseresearch.nl>
manon-traverse pushed a commit that referenced this pull request Jun 18, 2026
## Summary

First per-backend bring-up in the PSO raytracing series (#1268). Stacks
on top of #1270 (foundational schema + lit infrastructure + XFAILed
test). Adds the API surface needed by the upcoming D3D12 and Metal PRs
plus the Vulkan implementation behind it.

API surface:

- `ComputeEncoder::dispatchRays(PSO, SBT, W, H, D)` virtual on the
existing compute encoder (no separate `RayTracingEncoder`).
- `Device::createPipelineRT` + `Device::createShaderBindingTable`
virtuals with a new `RayTracingPipelineCreateDesc` carrying the DXIL
library blob, the shader entry points (Stage + EntryPoint), the
hit-group list, and the `RayTracingPipelineConfig`.
- `include/API/ShaderBindingTable.h` holding the abstract runtime base;
backend SBT classes derive from it with LLVM-style `classof` / `cast<>`.
- Rename: PR #1270's YAML struct `ShaderBindingTable` →
`ShaderBindingTableDesc` so the bare name is free for the runtime class
(parallel to `BLASDesc` / `TLASDesc` vs `AccelerationStructure`). YAML
key stays `ShaderBindingTable:`.
- D3D12 and Metal stub the new methods with not-yet-supported errors;
their bring-up lands in follow-up PRs.
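`llvm::isa` / `llvm::cast` dispatch on a static `classof` hook defined by each derived class. The minimal hand-rolled versions below show the pattern without depending on LLVM. The `GPUAPI` values, `VulkanShaderBindingTable` and its `RaygenCount` field are hypothetical here.

```cpp
#include <cassert>

// Hypothetical tag recording which backend created a runtime object.
enum class GPUAPI { DirectX, Vulkan, Metal };

// Abstract runtime SBT base (hypothetical shape).
class ShaderBindingTable {
public:
  explicit ShaderBindingTable(GPUAPI K) : Kind(K) {}
  virtual ~ShaderBindingTable() = default;
  GPUAPI getKind() const { return Kind; }

private:
  GPUAPI Kind;
};

class VulkanShaderBindingTable : public ShaderBindingTable {
public:
  VulkanShaderBindingTable() : ShaderBindingTable(GPUAPI::Vulkan) {}
  // LLVM-style RTTI hook: "is this base pointer really one of us?"
  static bool classof(const ShaderBindingTable *S) {
    return S->getKind() == GPUAPI::Vulkan;
  }
  unsigned RaygenCount = 1;
};

// Minimal stand-ins for llvm::isa / llvm::cast from llvm/Support/Casting.h.
template <typename To, typename From> bool isa(const From *V) {
  return To::classof(V);
}
template <typename To, typename From> To *cast(From *V) {
  assert(isa<To>(V) && "cast<Ty>() argument of incompatible type!");
  return static_cast<To *>(V);
}
```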

Vulkan implementation:

- The pre-existing `RaytracingFunctions RT` struct lumped AS and
RT-pipeline entry points together; they split into `ASFunctions AS` +
`RTPipelineFunctions RT` so the names match the actual feature-gate
split (AS + ray-query is a complete configuration; RT pipeline layers on
top). `HasRayTracingSupport` renames to `HasASSupport`;
`HasRTPipelineSupport` tracks the new extension.
- `VK_KHR_ray_tracing_pipeline` is requested when reported, with
`VkPhysicalDeviceRayTracingPipelineFeaturesKHR` chained pre-query and
the gating `rayTracingPipeline` bool checked post-query (matches the AS
/ BDA pattern from #1232). Sub-features the tests don't exercise
(capture-replay / indirect-trace / traversal-primitive-culling) are
cleared.
- Function pointers `vkCreateRayTracingPipelinesKHR`,
`vkGetRayTracingShaderGroupHandlesKHR`, `vkCmdTraceRaysKHR` resolve once
at device creation. `VkPhysicalDeviceRayTracingPipelinePropertiesKHR` is
cached at the same time for SBT handle size / alignment / base
alignment.
- `VKRayTracingPipelineState` derives from `VulkanPipelineState`; an
`IsRayTracing` flag on the base lets the existing Vulkan `cast<>` path
stay polymorphic without adding a new `GPUAPI` value. The derived class
also carries a `StringMap<uint32_t>` resolving each shader `EntryPoint`
or hit-group `Name` to its index in the pipeline's group array, plus
per-bucket counts so the SBT builder can slice the contiguous handle
blob into raygen / miss / hit / callable regions.
- `createPipelineRT` builds a single `VkShaderModule` (the DXIL library
compiles to one SPIR-V module with multiple `OpEntryPoint`s), one
`VkPipelineShaderStageCreateInfo` per `Shader` entry, and one
`VkRayTracingShaderGroupCreateInfoKHR` per general shader / hit group.
Pipeline layout uses the same `createPipelineLayout` helper as the
compute path, gated on all six RT stage flags so any binding can be
consumed from any RT shader.
- `createShaderBindingTable` allocates a host-visible coherent buffer
big enough for four regions, then lays out each entry as `[handle
bytes][LocalRootData bytes][padding-to-stride]`. Per-region stride =
`align(handleSize + max-LocalRootData-in-region, handleAlignment)`;
per-region size = `align(count * stride, baseAlignment)`. LocalRootData
support comes for free from PR #1270's SBT schema; the test doesn't
exercise it yet. Each region's `VkStridedDeviceAddressRegionKHR` derives
from the buffer's `vkGetBufferDeviceAddress`.
- `dispatchRays` binds the pipeline at
`VK_PIPELINE_BIND_POINT_RAY_TRACING_KHR`, emits a pre-barrier with
`ACCELERATION_STRUCTURE_READ_BIT_KHR | SHADER_READ_BIT |
SHADER_WRITE_BIT` dst access into `RAY_TRACING_SHADER_BIT_KHR`, then
calls `vkCmdTraceRaysKHR` with the SBT's four region structs.
- `createCommands` picks the new bind point for RT pipelines so
`vkCmdBindDescriptorSets` binds to the right point. `executeProgram`'s
`isRayTracing` branch builds a `RayTracingPipelineCreateDesc` from the
`Pipeline`, calls `createPipelineRT` then `createShaderBindingTable`,
and keeps both on `InvocationState` for the dispatch.
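The per-region SBT arithmetic above can be sketched as follows:

- stride = `align(handleSize + max-LocalRootData-in-region, handleAlignment)`
- size = `align(count * stride, baseAlignment)`
- each region starts where the previous one ends

`SBTRegion`, `layoutRegion` and the numbers in the usage (`handleSize = 32`, `handleAlignment = 32`, `baseAlignment = 64`) are hypothetical. The sketch also assumes the buffer's device address is itself base-aligned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Rounds V up to the next multiple of A (A > 0).
constexpr std::uint64_t alignUp(std::uint64_t V, std::uint64_t A) {
  return (V + A - 1) / A * A;
}

struct SBTRegion {
  std::uint64_t Offset, Stride, Size;
};

// Lays out one region (raygen, miss, hit or callable). Each record is
// [handle bytes][LocalRootData bytes][padding to stride].
SBTRegion layoutRegion(std::uint64_t Offset,
                       const std::vector<std::uint64_t> &LocalRootDataSizes,
                       std::uint64_t HandleSize, std::uint64_t HandleAlignment,
                       std::uint64_t BaseAlignment) {
  if (LocalRootDataSizes.empty())
    return {Offset, 0, 0}; // unused region
  const std::uint64_t MaxLocal =
      *std::max_element(LocalRootDataSizes.begin(), LocalRootDataSizes.end());
  const std::uint64_t Stride = alignUp(HandleSize + MaxLocal, HandleAlignment);
  const std::uint64_t Size =
      alignUp(LocalRootDataSizes.size() * Stride, BaseAlignment);
  return {Offset, Stride, Size};
}
```

As we read the Vulkan spec, `vkCmdTraceRaysKHR` requires the raygen region's `size` to equal its `stride`. The base-aligned raygen size computed here would therefore describe the allocation footprint rather than the raygen region passed to the trace call.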

Test side: `raygen-roundtrip.test`'s `XFAIL` becomes `Clang, DirectX,
Metal`. On a DXC + Vulkan combo with the device reporting
`VK_KHR_ray_tracing_pipeline` this should PASS; the Clang token still
catches the compile failure on the Linux + `clang-dxc` loop where
`[shader("raygeneration")]` doesn't yet lower to SPIR-V.

## Test plan

Local on an NVIDIA RTX 3060:
- [x] Linux Vulkan (native `offloader`)
- [ ] Linux D3D12 (Wine + vkd3d-proton + cross-compiled `offloader.exe`)
- [ ] Windows Vulkan (native `offloader.exe`)
- [ ] Windows D3D12 (native `offloader.exe`)

CI (RT-capable runners):
- [ ] windows-nvidia D3D12 (`RaytracingTier 1.2`)
- [ ] windows-intel VK (`VK_KHR_ray_tracing_pipeline`)
- [x] macOS Metal (`supportsRaytracing`)

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: EmilioLaiso <emilio@traverseresearch.nl>